Conversation

@ownia
Contributor

@ownia ownia commented Jan 28, 2026

I encountered an error while testing #18825 (on the latest master branch):

(.venv) ➜  llama.cpp git:(master) ✗ ./build/bin/llama-cli -m PaddleOCR-VL-GGUF.gguf \
  --mmproj PaddleOCR-VL-GGUF-mmproj.gguf \
  --color on \
  --image test.jpg \
  --prompt "OCR:" \
  --reasoning-budget 0
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.010 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   Apple M1
ggml_metal_device_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 12713.12 MB

Loading model...


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7852-eef375ce1
model      : PaddleOCR-VL-GGUF.gguf
modalities : text, vision

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file
  /image <file>       add an image file

Loaded media from 'test.jpg'

> OCR:

WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0   libggml-base.0.9.5.dylib            0x000000010088136c ggml_print_backtrace + 276
1   libggml-base.0.9.5.dylib            0x0000000100895ec8 _ZL23ggml_uncaught_exceptionv + 12
2   libc++abi.dylib                     0x000000018f594c2c _ZSt11__terminatePFvvE + 16
3   libc++abi.dylib                     0x000000018f598648 __cxa_increment_exception_refcount + 0
4   llama-cli                           0x00000001002e12d0 _ZN5jinja9statement7executeERNS_7contextE + 140
5   llama-cli                           0x000000010023862c _ZN5jinja7runtime7executeERKNS_7programE + 172
6   llama-cli                           0x0000000100237cf4 _ZL5applyRK20common_chat_templateRK16templates_paramsRKNSt3__18optionalIN8nlohmann16json_abi_v3_12_010basic_jsonINS8_11ordered_mapENS5_6vectorENS5_12basic_stringIcNS5_11char_traitsIcEENS5_9allocatorIcEEEEbxydSF_NS8_14adl_serializerENSB_IhNSF_IhEEEEvEEEESO_SO_ + 2300
7   llama-cli                           0x0000000100236e80 _ZL37common_chat_params_init_without_toolsRK20common_chat_templateRK16templates_params + 112
8   llama-cli                           0x000000010022b47c _ZL33common_chat_templates_apply_jinjaPK21common_chat_templatesRK28common_chat_templates_inputs + 18976
9   llama-cli                           0x000000010010cd48 _ZN11cli_context11format_chatEv + 308
10  llama-cli                           0x0000000100106454 _ZN11cli_context19generate_completionER14result_timings + 80
11  llama-cli                           0x00000001001051f8 main + 4912
12  dyld                                0x000000018f219d54 start + 7184
libc++abi: terminating due to uncaught exception of type jinja::rethrown_exception:
------------
While executing For at line 14, column 13 in source:
...%}↵        {{- "User: " -}}↵        {%- for content in message["content"] -%}↵  ...
                                           ^
Error: Expected iterable or object type in for loop: got String
[1]    53856 abort      ./build/bin/llama-cli -m PaddleOCR-VL-GGUF.gguf --mmproj  --color on --image

The original chat template is at https://huggingface.co/PaddlePaddle/PaddleOCR-VL/blob/main/chat_template.jinja
I use git bisect to find this commit (6df686b) introduced a change which cleaned up the chatml fallback and it is the first bad commit. I think the root cause is that jinja templates will fail when they receive plain strings during the content array indexing (for llama-cli case). So this PR will detect templates that expect typed/array-style message content and convert string contents into a typed-content array if requires_typed_content.

@ownia ownia changed the title common : convert string contents to arrays if template requires typed… common : convert string contents to arrays if template requires typed content Jan 28, 2026
Collaborator

@ngxson ngxson left a comment


This may not be a good fix. Formatting can fail for multiple reasons, so assuming it fails due to typed content will break other templates.

Instead, we should have a system to enable a capability, then retry the formatting to verify whether it actually works. I'll have a look at that later.
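
One possible shape for that retry, as a rough sketch: `format` is a hypothetical stand-in for whatever renders the chat template and throws on failure, not an actual API.

#include <functional>
#include <string>
#include <vector>
#include <nlohmann/json.hpp>

using json = nlohmann::ordered_json;

// Illustrative only: first try the messages as given; only if formatting
// throws, wrap string contents into typed arrays and retry once.
static std::string apply_with_retry(
        const std::function<std::string(const std::vector<json> &)> & format,
        std::vector<json> messages) {
    try {
        return format(messages);
    } catch (const std::exception &) {
        for (auto & msg : messages) {
            if (msg.contains("content") && msg["content"].is_string()) {
                const std::string text = msg["content"].get<std::string>();
                msg["content"] = json::array({
                    { {"type", "text"}, {"text", text} },
                });
            }
        }
        return format(messages); // retry; rethrows if it still fails
    }
}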

@ownia
Contributor Author

ownia commented Jan 28, 2026

Yeah, this is a quick fix that I've validated locally; we should be cautious about template formatting. Feel free to modify my patch.

@CISC
Collaborator

CISC commented Jan 28, 2026

Funnily enough, it's not illegal to pass a string to a for loop in jinja2 (since strings are both sequences and iterables); it just doesn't give any reasonable output. I think I prefer failing the way we do, as doing so is most certainly a bug.

Edit: reasonable output, as in it yields every single character of the string, which hopefully no chat template will use or expect to work.

@ownia
Contributor Author

ownia commented Jan 28, 2026

Yes, I think the current error message is reasonable. I'm just thinking about how to make llama-cli more flexible (and stable) in accepting templates, rather than attempting to update the original template.

@github-actions github-actions bot added the jinja parser Issues related to the jinja parser label Jan 28, 2026
@pwilkin
Collaborator

pwilkin commented Jan 28, 2026

It's a templating problem. I'm working on it in the autoparser branch. Basically, for some templates we need to vary whether we treat an unquoted value in a field as a string or as a JSON object based on the schema for the tool call, i.e.

<arg name=baz>
foo
</arg>

is a string when baz is typed as a string and

<arg name=baz>
["foo"]
</arg>

should be a JSON array (not the literal string ["foo"]) when baz is typed as an array.
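
Sketched out, with illustrative names only (this is not the actual autoparser code), the schema-driven decision could look like:

#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::ordered_json;

// Illustrative only: interpret a raw <arg> value according to the declared
// type in the tool's JSON schema for that argument.
static json parse_arg_value(const std::string & raw, const json & schema) {
    if (schema.value("type", "string") == "string") {
        return raw;              // <arg>foo</arg> stays the string "foo"
    }
    return json::parse(raw);     // <arg>["foo"]</arg> becomes a JSON array
}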

@pwilkin
Collaborator

pwilkin commented Jan 28, 2026

Oh, sorry, I misread; this is about converting content, not tool call args. Still, the general problem is the same: we need to be able to determine which one we should parse. The key thing is that when streaming, you cannot just "change your mind" partway through, because the delta would become invalid, so you have to know in advance whether the content you'll be getting is an array or a string.

